Towards deeper understanding of the latent semantic analysis performance

Authors

  • Preslav Nakov
  • Elena Valchanova
  • Galia Angelova
Abstract

The paper studies the factors influencing the performance of Latent Semantic Analysis. Unlike previous related research, which concentrates on parameters such as matrix element weighting, space dimensionality, and similarity measure, we address the impact of another fundamental factor: the definition of “word”. To this end, a series of experiments was performed on two corpora to compare, with respect to the task of text categorisation, six word variants of differing linguistic quality. The results show that while linguistic processing influences performance, the traditional factors are more important.

Similar resources

Towards Deeper Understanding of the LSA Performance

The paper presents on-going work towards deeper understanding of the factors influencing the performance of the Latent Semantic Analysis (LSA). Unlike previous attempts that concentrate on problems such as matrix elements weighting, space dimensionality selection, similarity measure etc., we primarily study the impact of another, often neglected, but fundamental element of LSA (and of any text ...

Query expansion based on relevance feedback and latent semantic analysis

Web search engines are among the most popular tools on the Internet and are widely used by expert and novice users alike. Constructing an adequate query that best specifies the user's information need to the search engine is an important concern of web users. Query expansion is a way to reduce this concern and increase user satisfaction. In this paper, a new method of query expa...

Matrices with Low-Rank-Plus-Shift Structure: Partial SVD and Latent Semantic Indexing

We present a detailed analysis of matrices satisfying the so-called low-rank-plus-shift property in connection with the computation of their partial singular value decomposition. The application we have in mind is Latent Semantic Indexing for information retrieval where the term-document matrices generated from a text corpus approximately satisfy this property. The analysis is motivated by develo...

Probabilistic Latent Semantic Analysis for Broadcast News Story Segmentation

This paper proposes to perform probabilistic latent semantic analysis (PLSA) for broadcast news (BN) story segmentation. PLSA exploits a deeper underlying relation among terms beyond their occurrences, so conceptual matching can be employed to replace literal term matching. Unlike text segmentation, lexical-based BN story segmentation has to be carried out over LVCSR transcripts, where...

Document Cohesion Flow: Striving towards Coherence

Text cohesion is an important element of discourse processing. This paper presents a new approach to modeling, quantifying, and visualizing text cohesion using automated cohesion flow indices that capture semantic links among paragraphs. Cohesion flow is calculated by applying Cohesion Network Analysis, a combination of semantic distances, Latent Semantic Analysis, and Latent Dirichlet Allocati...

Publication date: 2003